You are looking at historical revision 8398 of this page. It may differ significantly from its current revision.
Unit regex
This library unit provides support for regular expressions. The regular expression package used is PCRE (Perl Compatible Regular Expressions) written by Philip Hazel. See http://www.pcre.org for information about the particular regexp flavor and extensions provided by this library.
To test that PCRE support has been built into Chicken properly, try:
(require 'regex)
(test-feature? 'pcre) => #t
grep
[procedure] (grep REGEX LIST)
Returns all items of LIST that match the regular expression REGEX. This procedure could be defined as follows:
(require-extension srfi-1 regex) ; filter is provided by SRFI-1
(define (grep regex lst)
  (filter (lambda (x) (string-search regex x)) lst) )
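A short usage sketch, assuming a CHICKEN installation where this regex unit is available through (require-extension regex). Because grep searches each item rather than matching it whole, the pattern uses "^" to require the match at the start of each string.

```scheme
(require-extension regex)

;; Keep only the words that begin with "a".
;; "^" anchors the search at the start of each item.
(define words '("apple" "banana" "avocado" "cherry"))
(define a-words (grep "^a" words))
;; a-words => ("apple" "avocado")
```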
glob->regexp
[procedure] (glob->regexp PATTERN)
Converts the file-pattern PATTERN into a regular expression.
(glob->regexp "foo.*") => "foo\..*"
PATTERN should follow "glob" syntax. Allowed wildcards are
* [C...] [C1-C2] [-C...] ?
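The sketch below filters file names with a glob, assuming the regex unit is loaded as shown. glob-match? is a hypothetical helper, not part of this unit. It passes the converted pattern to string-match, which anchors it at both ends, so "*.scm" does not accept "notes.scm.bak" (a plain string-search would).

```scheme
(require-extension regex)

;; Hypothetical helper: does FILENAME match the glob PATTERN as a whole?
(define (glob-match? pattern filename)
  (and (string-match (glob->regexp pattern) filename) #t))

(define files '("main.scm" "notes.scm.bak" "README" "util.scm"))

;; Collect the names that match "*.scm", preserving their order.
(define scheme-files
  (let loop ((fs files) (acc '()))
    (cond ((null? fs) (reverse acc))
          ((glob-match? "*.scm" (car fs)) (loop (cdr fs) (cons (car fs) acc)))
          (else (loop (cdr fs) acc)))))
;; scheme-files => ("main.scm" "util.scm")
```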
glob?
[procedure] (glob? STRING)
Tests whether STRING contains any "glob" wildcards.
A string without any wildcards yields #f, even though it is technically a valid "glob" file-pattern.
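One way to use glob? is to skip the regular expression machinery for plain names. This is a sketch assuming the regex unit is loaded; filename-matches? is a hypothetical helper that treats PATTERN as a glob only when it contains wildcards and otherwise compares literally.

```scheme
(require-extension regex)

;; Hypothetical helper: glob match when PATTERN has wildcards,
;; plain string comparison otherwise.
(define (filename-matches? pattern filename)
  (if (glob? pattern)
      (and (string-match (glob->regexp pattern) filename) #t)
      (string=? pattern filename)))

(filename-matches? "*.scm" "main.scm")   ; => #t
(filename-matches? "README" "README")    ; => #t
(filename-matches? "README" "README.md") ; => #f
```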
regex-chardef-table?
[procedure] (regex-chardef-table? OBJECT)
Returns #t if the OBJECT is a character definitions table, and #f otherwise.
regex-chardef-table
[procedure] (regex-chardef-table)
Returns a new character definitions table.
regexp
[procedure] (regexp STRING [IGNORECASE [IGNORESPACE [UTF8]]])
Returns a precompiled regular expression object for STRING. The optional arguments IGNORECASE, IGNORESPACE and UTF8 specify whether the regular expression should be matched with case differences ignored, whether whitespace should be ignored, and whether the string should be treated as containing UTF-8 encoded characters, respectively.
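A minimal sketch of compiling a pattern once with IGNORECASE set to #t and reusing it, assuming the regex unit is loaded. It also shows regexp? distinguishing a compiled object from a plain pattern string.

```scheme
(require-extension regex)

;; Compile once, ignoring case, then reuse the object.
(define greeting (regexp "hello" #t))

(regexp? greeting)                    ; => #t
(regexp? "hello")                     ; => #f, a plain string is not precompiled
(string-search greeting "Say HELLO!") ; => ("HELLO")
```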
regexp*
[procedure] (regexp* STRING [OPTIONS [CHARDEFS-TABLE]])
Returns a precompiled regular expression object for STRING. The optional argument OPTIONS must be a list of option symbols. The optional argument CHARDEFS-TABLE must be a character definitions table.
Option Symbols:
- caseless: Case-insensitive matching
- multiline: Equivalent to Perl's /m option
- dotall: Equivalent to Perl's /s option
- extended: Ignore whitespace in the pattern
- anchored: Anchor the pattern match
- dollar-endonly: `$' metacharacter in the pattern matches only at the end of the subject string
- extra: Currently of very little use
- notbol: The first character of the string is not the beginning of a line
- noteol: The end of the string is not the end of a line
- ungreedy: Inverts the "greediness" of the quantifiers so that they are not greedy by default
- notempty: The empty string is not considered to be a valid match
- utf8: UTF-8 encoded characters
- no-auto-capture: Disables the use of numbered capturing parentheses
- no-utf8-check: Skip the check for valid UTF-8 sequences
- auto-callout: Automatically inserts callout items (not described here)
- partial: Partial matches are allowed
- firstline: An unanchored pattern is required to match before or at the first newline
- dupnames: Names used to identify capturing subpatterns need not be unique
- newline-cr: Newline definition is `\r'
- newline-lf: Newline definition is `\n'
- newline-crlf: Newline definition is `\r\n'
- newline-anycrlf: Newline definition is any of `\r', `\n', or `\r\n'
- newline-any: Newline definition is any Unicode newline sequence
- bsr-anycrlf: `\R' escape sequence matches only CR, LF, or CRLF
- bsr-unicode: `\R' escape sequence matches any Unicode newline sequence
- dfa-shortest: Currently unused
- dfa-restart: Currently unused
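The sketch below shows the effect of the multiline option, assuming the PCRE-based regex unit described on this page (regexp* and its option symbols belong to that version). With multiline, "^" also matches right after a newline, as with Perl's /m; several options can be combined in the list.

```scheme
(require-extension regex)

;; "^b" with and without the multiline option.
(define line-start-b (regexp* "^b" '(multiline)))
(define plain-b      (regexp* "^b"))

(string-search line-start-b "a\nb") ; => ("b")
(string-search plain-b "a\nb")      ; => #f

;; Options combine: case-insensitive and multiline together.
(string-search (regexp* "^b" '(caseless multiline)) "a\nB") ; => ("B")
```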
regexp?
[procedure] (regexp? X)
Returns #t if X is a precompiled regular expression, or #f otherwise.
regexp-optimize
[procedure] (regexp-optimize RX)
Performs any available optimizations on the precompiled regular expression RX. Returns #t if an optimization was performed, and #f otherwise.
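A sketch of the intended pattern of use, assuming the PCRE-based regex unit where regexp-optimize is available: compile a pattern that will be applied many times, request optimization once, and use the object the same way regardless of the boolean result. count-numeric is a hypothetical helper.

```scheme
(require-extension regex)

;; Compile once, then ask for optimization; the result only reports
;; whether anything was done.
(define number-rx (regexp "[0-9]+"))
(define optimized? (regexp-optimize number-rx))

;; Hypothetical helper: count the strings that contain a digit run.
(define (count-numeric lines)
  (let loop ((ls lines) (n 0))
    (if (null? ls)
        n
        (loop (cdr ls) (if (string-search number-rx (car ls)) (+ n 1) n)))))

(count-numeric '("abc" "a1" "42" "x")) ; => 2
```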
string-match
string-match-positions
[procedure] (string-match REGEXP STRING [START])
[procedure] (string-match-positions REGEXP STRING [START])
Matches the regular expression REGEXP (a string or a precompiled regular expression) against STRING and returns either #f if the match failed, or a list of matching groups, where the first element is the complete match. If the optional argument START is supplied, it specifies the starting position in STRING. For each matching group the result list contains either: #f for a non-matching but optional group; a list of the start and end positions of the match in STRING (in the case of string-match-positions); or the matching substring (in the case of string-match). Note that the entire string must match; to search for a pattern inside a string, use string-search (see below). Note also that string-match is implemented by calling string-search with the regular expression wrapped in ^ ... $. If invoked with a precompiled regular expression argument (created with regexp), string-match is identical to string-search.
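The following sketch illustrates the result formats described above, assuming the regex unit is loaded: the whole match comes first, then each parenthesized group; an optional group that did not participate yields #f; and string-match differs from string-search in requiring the whole string to match.

```scheme
(require-extension regex)

;; Whole match first, then each group in order.
(string-match "([a-z]+)-([0-9]+)" "abc-123")
;; => ("abc-123" "abc" "123")

;; The same match as (start end) position lists.
(string-match-positions "([a-z]+)-([0-9]+)" "abc-123")
;; => ((0 7) (0 3) (4 7))

;; The optional group (x)? does not take part, so it yields #f.
(string-match "a(x)?(b)" "ab")
;; => ("ab" #f "b")

;; string-match needs the whole string; string-search does not.
(string-match "b" "abc")   ; => #f
(string-search "b" "abc")  ; => ("b")
```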
string-search
string-search-positions
[procedure] (string-search REGEXP STRING [START [RANGE]])
[procedure] (string-search-positions REGEXP STRING [START [RANGE]])
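Searches STRING for the first match of the regular expression REGEXP and returns the result in the same form as string-match and string-match-positions, respectively. The optional argument START specifies the position at which the search begins, and the search can be limited to RANGE characters.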
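A sketch of using START, assuming the regex unit is loaded. search-all is a hypothetical helper, not part of this unit: it collects every non-overlapping match by restarting string-search-positions at the end of the previous match, stepping forward by one character after an empty match so the loop always progresses.

```scheme
(require-extension regex)

;; START = 8 begins the search at the "d" of "def".
(string-search "[0-9]+" "abc 123 def 456" 8) ; => ("456")

;; Hypothetical helper: all non-overlapping matches of RX in STR.
(define (search-all rx str)
  (let loop ((start 0) (acc '()))
    (let ((pos (and (< start (string-length str))
                    (string-search-positions rx str start))))
      (if (not pos)
          (reverse acc)
          (let ((s (car (car pos)))     ; start of the whole match
                (e (cadr (car pos))))   ; end of the whole match
            ;; After an empty match, advance one character.
            (loop (if (= s e) (+ e 1) e)
                  (cons (substring str s e) acc)))))))

(search-all "[0-9]+" "abc 123 def 456") ; => ("123" "456")
```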
string-split-fields
[procedure] (string-split-fields REGEXP STRING [MODE [START]])
Splits STRING into a list of fields according to MODE, where MODE can be the keyword #:infix (REGEXP matches field separator), the keyword #:suffix (REGEXP matches field terminator) or #t (REGEXP matches field), which is the default.
(define s "this is a string 1, 2, 3,")
(string-split-fields "[^ ]+" s) => ("this" "is" "a" "string" "1," "2," "3,")
(string-split-fields " " s #:infix) => ("this" "is" "a" "string" "1," "2," "3,")
(string-split-fields "," s #:suffix) => ("this is a string 1" " 2" " 3")
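Two further sketches of the modes, assuming the regex unit is loaded: #:infix splits a PATH-style string on its separator, while the default mode (#t) returns the substrings that REGEXP itself matches.

```scheme
(require-extension regex)

;; #:infix: REGEXP is the separator between fields.
(string-split-fields ":" "/usr/bin:/bin:/usr/local/bin" #:infix)
;; => ("/usr/bin" "/bin" "/usr/local/bin")

;; Default mode: REGEXP describes the fields, here runs of digits.
(string-split-fields "[0-9]+" "x=10, y=20")
;; => ("10" "20")
```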
string-substitute
[procedure] (string-substitute REGEXP SUBST STRING [MODE])
Searches STRING for substrings that match REGEXP and replaces them with the string SUBST. The substitution can contain references to subexpressions in REGEXP with the \NUM notation, where NUM refers to the NUMth parenthesized expression. The optional argument MODE defaults to 1 and specifies the number of the match to be substituted. Any non-numeric MODE specifies that all matches are to be substituted.
(string-substitute "([0-9]+) (eggs|chicks)" "\\2 (\\1)" "99 eggs or 99 chicks" 2) => "99 eggs or chicks (99)"
Note that an error is signaled if REGEXP produces an empty match.
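A sketch of both MODE behaviours, assuming the regex unit is loaded. The first call relies on the default MODE of 1 and uses \NUM references (written "\\3" inside a Scheme string) to reorder the parts of an ISO date; the second passes the non-numeric MODE #t to replace every match.

```scheme
(require-extension regex)

;; Default MODE 1: only the first match is replaced.
(string-substitute "([0-9]+)-([0-9]+)-([0-9]+)" "\\3.\\2.\\1" "2008-03-14")
;; => "14.03.2008"

;; Non-numeric MODE: every match is replaced.
(string-substitute "o" "0" "foo boo" #t)
;; => "f00 b00"
```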
string-substitute*
[procedure] (string-substitute* STRING SMAP [MODE])
Performs substitutions on STRING with string-substitute according to SMAP. SMAP should be an association list where each element is a pair of the form (MATCH . REPLACEMENT). Every occurrence of the regular expression MATCH in STRING will be replaced by the string REPLACEMENT.
(string-substitute* "<h1>Hello, world!</h1>" '(("<[/A-Za-z0-9]+>" . ""))) => "Hello, world!"
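A sketch that escapes HTML metacharacters, assuming the regex unit is loaded; html-escape is a hypothetical helper. On the assumption that the pairs are applied one after another, "&" is listed first so that the ampersands introduced by the later replacements are not escaped again.

```scheme
(require-extension regex)

;; Hypothetical helper: escape &, < and > for HTML output.
;; "&" comes first so later "&lt;" / "&gt;" are left alone.
(define (html-escape str)
  (string-substitute* str '(("&" . "&amp;") ("<" . "&lt;") (">" . "&gt;"))))

(html-escape "a < b && c > d") ; => "a &lt; b &amp;&amp; c &gt; d"
```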
regexp-escape
[procedure] (regexp-escape STRING)
Escapes all special characters in STRING with \, so that the string can be embedded into a regular expression.
(regexp-escape "^[0-9]+:.*$") => "\\^\\[0-9\\]\\+:\\.\\*\\$"
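Why escaping matters when searching for literal text, sketched under the assumption that the regex unit is loaded: unescaped, the "." in "1.5" matches any character and finds "105" first; escaped, only the literal "1.5" matches.

```scheme
(require-extension regex)

;; Unescaped: "." matches any character, so "105" is found first.
(string-search "1.5" "version 105 or 1.5")                 ; => ("105")

;; Escaped: the dot is literal.
(string-search (regexp-escape "1.5") "version 105 or 1.5") ; => ("1.5")
```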
make-anchored-pattern
[procedure] (make-anchored-pattern REGEXP [WITHOUT-BOL [WITHOUT-EOL]])
Makes an anchored pattern from REGEXP (a string or a precompiled regular expression) and returns the updated pattern. When WITHOUT-BOL is #t the beginning-of-line anchor is not added. When WITHOUT-EOL is #t the end-of-line anchor is not added.
The WITHOUT-BOL and WITHOUT-EOL arguments are ignored for a precompiled regular expression.
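A sketch with string patterns, assuming the regex unit is loaded. Fully anchored, the pattern must cover the whole subject even under string-search; with WITHOUT-EOL set to #t, the digits only have to start the string.

```scheme
(require-extension regex)

;; Anchored at both ends: the string must consist only of digits.
(define all-digits (make-anchored-pattern "[0-9]+"))
(string-search all-digits "12345") ; => ("12345")
(string-search all-digits "12a45") ; => #f

;; WITHOUT-BOL = #f, WITHOUT-EOL = #t: leading digits only.
(define leading-digits (make-anchored-pattern "[0-9]+" #f #t))
(string-search leading-digits "123abc") ; => ("123")
```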